Search for: All records

Creators/Authors contains: "Zhou, Yang"


  1. Linear Autoencoders (LAEs) have shown strong performance in state-of-the-art recommender systems. However, this success remains largely empirical, with limited theoretical understanding. In this paper, we investigate the generalizability – a theoretical measure of model performance in statistical learning – of multivariate linear regression and LAEs. We first propose a PAC-Bayes bound for multivariate linear regression, extending the earlier bound for single-output linear regression by Shalaeva et al. [45], and establish sufficient conditions for its convergence. We then show that LAEs, when evaluated under a relaxed mean squared error, can be interpreted as constrained multivariate linear regression models on bounded data, to which our bound adapts. Furthermore, we develop theoretical methods to improve the computational efficiency of optimizing the LAE bound, enabling its practical evaluation on large models and real-world datasets. Experimental results demonstrate that our bound is tight and correlates well with practical ranking metrics such as Recall@K and NDCG@K. 
  2. When do host governments protect migrants and expand their rights? On 8 February 2021, Colombian President Iván Duque announced a 10-year temporary protected status for over 1.7 million Venezuelan migrants, a policy shift contrasting with more restrictive global migration responses. We offer three motivations for Colombia’s liberalization of migration policy: the pragmatic response to challenges in border control, the economic and legibility benefits of migrant regularization, and the pursuit of international reputation gains. Drawing on interviews with thirty Colombian policymakers, politicians, diplomats, bureaucrats, and NGO leaders, this study offers insights into the drivers of inclusive migration policies in the Global South.
  3. Machine unlearning (MU) aims to remove the influence of specific data points from trained models, enhancing compliance with privacy regulations. However, the vulnerability of basic MU models to malicious unlearning requests in adversarial learning environments has been largely overlooked. Existing adversarial MU attacks suffer from three key limitations: inflexibility due to pre-defined attack targets, inefficiency in handling multiple attack requests, and instability caused by non-convex loss functions. To address these challenges, we propose a Flexible, Efficient, and Stable Attack (DDPA). First, leveraging Carathéodory's theorem, we introduce a convex polyhedral approximation to identify points in the loss landscape where convexity approximately holds, ensuring stable attack performance. Second, inspired by simplex theory and John's theorem, we develop a regular simplex detection technique that maximizes coverage over the parameter space, improving attack flexibility and efficiency. We theoretically derive the proportion of the effective parameter space occupied by the constructed simplex. We evaluate the attack success rate of our DDPA method on real datasets against state-of-the-art machine unlearning attack methods. Our source code is available at https://github.com/zzz0134/DDPA. 
  5. The accurate and prompt mapping of flood-affected regions is important for effective disaster management, including damage assessment and relief efforts. While high-resolution optical satellite imagery captured during disasters presents an opportunity for automated flood inundation mapping, existing segmentation models are challenged by noise such as cloud cover and tree canopies. Using digital elevation model (DEM) data, readily available from sources such as the United States Geological Survey (USGS), recent graphical models such as hidden Markov trees (HMTs) have used terrain guidance to improve segmentation quality. Unfortunately, these methods either handle only a small area in which water levels at different locations are assumed to be consistent, or require restrictive assumptions, such as the presence of only one river channel. This article presents an algorithm for flood extent mapping on large-scale Earth imagery, applicable to a large geographic area with multiple river channels. Since the water level can vary substantially from upstream to downstream, we propose to detect river pixels and use them to partition the remaining pixels into localized zones, each with a unique water level. In each zone, water at all locations flows to the same river entry point. Pixels in each zone are organized by an HMT to capture water flow directions guided by elevations. Moreover, a novel regularization scheme is designed to enforce inter-zone consistency by penalizing pixel pairs of adjacent zones that violate terrain guidance. Efficient parallelization is made possible by coloring the zone adjacency graph to identify zones and zone pairs that have no dependency and hence can be processed in parallel, and incremental one-pass terrain-guided scanning is conducted wherever applicable to reuse computations. Experiments demonstrate that our solution is more accurate than existing solutions and can efficiently map flooded pixels in a very large area of size 24,805 × 40,129. Despite the imbalanced workloads caused by a few large zonal HMTs dominating the serial computing time, our parallelization approach is effective, achieving up to 14.3× speedup using 32 threads on a machine with an Intel Xeon Gold 6126 CPU @ 2.60 GHz (24 cores, 48 threads).
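
The flood-mapping abstract above mentions coloring the zone adjacency graph so that zones with no dependency can be processed in parallel. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a plain greedy coloring with a largest-degree-first ordering, and the graph `zone_adjacency` and the function names are hypothetical, chosen only for illustration. Zones that receive the same color share no edge, so each color class could in principle be handled as one parallel batch.

```python
from collections import defaultdict


def greedy_coloring(adjacency):
    """Give each zone the smallest color unused by its already-colored neighbors."""
    colors = {}
    # Assumption: visit high-degree zones first (a common heuristic,
    # not something stated in the abstract); ties are broken by name.
    for zone in sorted(adjacency, key=lambda z: (-len(adjacency[z]), z)):
        used = {colors[n] for n in adjacency[zone] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[zone] = color
    return colors


def parallel_batches(adjacency):
    """Group zones by color; zones within one batch are pairwise non-adjacent."""
    batches = defaultdict(list)
    for zone, color in greedy_coloring(adjacency).items():
        batches[color].append(zone)
    return [sorted(batches[c]) for c in sorted(batches)]


# Hypothetical zone adjacency graph: an edge means two zones share a boundary.
zone_adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

batches = parallel_batches(zone_adjacency)
print(batches)  # → [['C'], ['A', 'D'], ['B']]
```

In this toy graph, zones A and D are not adjacent and land in the same batch, while A, B, and C are mutually adjacent and must each go in a different batch. Greedy coloring does not guarantee the minimum number of colors in general; it is used here only because it is simple to follow.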